feat: vulnerability scanning within git integration (IN-956)#3892
|
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability. Example:
Projects:
Please add a Jira issue key to your PR title. |
related: linuxfoundation/insights#1725
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
...es/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go
services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    )
    try:
A connection error raised here would not be caught. Shouldn't the connect call be inside the try block?
    - conn = await asyncpg.connect(
    -     user=os.environ["INSIGHTS_DB_USERNAME"],
    -     password=os.environ["INSIGHTS_DB_PASSWORD"],
    -     database=os.environ["INSIGHTS_DB_DATABASE"],
    -     host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    -     port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    - )
    - try:
    + try:
    +     conn = await asyncpg.connect(
    +         user=os.environ["INSIGHTS_DB_USERNAME"],
    +         password=os.environ["INSIGHTS_DB_PASSWORD"],
    +         database=os.environ["INSIGHTS_DB_DATABASE"],
    +         host=os.environ["INSIGHTS_DB_WRITE_HOST"],
    +         port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    +     )
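The suggestion above is cut off after the connect call, so here is a minimal sketch of the full shape it implies. It is not the code from this PR: `mark_stale_scans_failed`, `FakeConnection`, `failing_connect`, and the host name are hypothetical, and the stub stands in for `asyncpg.connect` so the example runs without a database. Catching `OSError` is an assumption chosen to cover network-level failures.

```python
import asyncio


class FakeConnection:
    """Stand-in for a database connection so the sketch runs without a DB."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def failing_connect(**kwargs) -> FakeConnection:
    # Simulates the database being unreachable at connect time.
    raise ConnectionRefusedError("insights DB unreachable")


async def mark_stale_scans_failed(connect) -> bool:
    """Return True on success, False if connecting or querying failed."""
    conn = None
    try:
        # Connecting inside the try block lets the handler below see
        # connection errors as well as query errors.
        conn = await connect(host="db.example.invalid", port=5432)
        # ... the real cleanup queries would run here ...
        return True
    except OSError as exc:
        print(f"stale scan cleanup skipped: {exc}")
        return False
    finally:
        # conn is still None when connect() itself raised.
        if conn is not None:
            await conn.close()


ok = asyncio.run(mark_stale_scans_failed(failing_connect))
```

Initialising `conn` to `None` before the `try` keeps the `finally` clause safe when the connect call is the statement that fails.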
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
Pull request overview
Adds an automated vulnerability-scanning step to the git integration worker, implemented as a Go OSV-Scanner-based binary invoked from Python, with results persisted to the insights DB.
Changes:
- Run a new `VulnerabilityScannerService` on the first clone batch and record an execution via `OperationType.VULNERABILITY_SCAN`.
- Introduce a new Go-based `vulnerability-scanner` module/binary (OSV Scanner SDK) plus Docker build plumbing.
- Extend `run_shell_command` to propagate return codes and optionally stream stderr.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| services/apps/git_integration/src/crowdgit/worker/repository_worker.py | Invokes vulnerability scan on first clone batch. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py | Python wrapper for scanner subprocess + execution tracking + stale scan cleanup. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner.go | Core Go scanning logic + normalization + DB persistence. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/types.go | Shared response / DB model types for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/main.go | CLI entrypoint + JSON stdout formatting. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.mod | Go module definition for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/go.sum | Go dependency lockfile for scanner. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/db.go | Insights DB connection + upsert/resolve strategy + scan tracking. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/config.go | Reads target path + insights DB env configuration. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md | Design/behavior documentation for scanner component. |
| services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/.gitignore | Ignores local Go build artifacts. |
| services/apps/git_integration/src/crowdgit/services/utils.py | Adds stderr streaming + returncode propagation in run_shell_command. |
| services/apps/git_integration/src/crowdgit/services/\_\_init\_\_.py | Exposes VulnerabilityScannerService. |
| services/apps/git_integration/src/crowdgit/server.py | Wires scanner service into app lifecycle / worker init. |
| services/apps/git_integration/src/crowdgit/errors.py | Adds returncode field to CommandExecutionError. |
| services/apps/git_integration/src/crowdgit/enums.py | Adds OperationType.VULNERABILITY_SCAN. |
| scripts/services/docker/Dockerfile.git_integration | Builds + ships vulnerability-scanner binary in the image. |
| backend/.env.dist.local | Adds local insights DB env vars. |
| backend/.env.dist.composed | Adds composed insights DB host env var. |
services/apps/git_integration/src/crowdgit/services/vulnerability_scanner/README.md
...git_integration/src/crowdgit/services/vulnerability_scanner/vulnerability_scanner_service.py
Outdated
    conn = await asyncpg.connect(
        user=os.environ["INSIGHTS_DB_USERNAME"],
        password=os.environ["INSIGHTS_DB_PASSWORD"],
        database=os.environ["INSIGHTS_DB_DATABASE"],
        host=os.environ["INSIGHTS_DB_WRITE_HOST"],
        port=int(os.environ.get("INSIGHTS_DB_PORT", "5432")),
    config.InsightsDatabase.User = os.Getenv("INSIGHTS_DB_USERNAME")
    config.InsightsDatabase.Password = os.Getenv("INSIGHTS_DB_PASSWORD")
    config.InsightsDatabase.DBName = os.Getenv("INSIGHTS_DB_DATABASE")
    config.InsightsDatabase.Host = os.Getenv("INSIGHTS_DB_WRITE_HOST")
    if portStr := os.Getenv("INSIGHTS_DB_PORT"); portStr != "" {
        if port, err := strconv.Atoi(portStr); err == nil {
            config.InsightsDatabase.Port = port
        }
    }
    config.InsightsDatabase.SSLMode = os.Getenv("INSIGHTS_DB_SSLMODE")
    if poolMaxStr := os.Getenv("INSIGHTS_DB_POOL_MAX"); poolMaxStr != "" {
        if poolMax, err := strconv.Atoi(poolMaxStr); err == nil {
            config.InsightsDatabase.PoolMax = poolMax
        }
    }
5f955ac to 9b7b58e
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 6 total unresolved issues (including 4 from previous reviews).
services/apps/merge_suggestions_worker/src/memberSimilarityCalculator.ts
Signed-off-by: anil <epipav@gmail.com>
9b7b58e to 706ee5c
Signed-off-by: anil <epipav@gmail.com>

Adds automated vulnerability scanning for all git repositories using the Google OSV Scanner SDK. Runs on the first clone batch per repo and persists results directly to the insights database.
Architecture
Go binary wrapped in Python — OSV Scanner is a Go library with no Python bindings. We embed it as an SDK dependency and call it programmatically, following the same subprocess + JSON stdout pattern as the software-value service.
The binary exits with code 0 and communicates errors through the JSON payload, so the Python subprocess machinery never misinterprets a non-zero exit as a crash.
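The subprocess + JSON stdout pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the wrapper from this PR: a small Python child process stands in for the Go binary, and the payload fields `status` and `error` are hypothetical rather than the scanner's real schema.

```python
import json
import subprocess
import sys

# Stand-in for the scanner binary: a tiny Python child that always exits 0
# and reports its outcome in a JSON document on stdout. The field names
# ("status", "error") are hypothetical, not the scanner's real schema.
FAKE_SCANNER = [
    sys.executable,
    "-c",
    'import json; print(json.dumps({"status": "failure", "error": "no lockfile"}))',
]


def run_scanner(cmd: list[str]) -> dict:
    """Run the command and decode the JSON payload it prints on stdout."""
    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
    if proc.returncode != 0:
        # With errors carried in the payload, a non-zero exit can only mean
        # the process itself died (crash, signal), not a reported scan error.
        raise RuntimeError(f"scanner crashed with exit code {proc.returncode}")
    return json.loads(proc.stdout)


result = run_scanner(FAKE_SCANNER)
```

Keeping scan failures in the payload means the caller can tell "the scan ran and reported a problem" apart from "the process never finished".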
Design decisions
- Vulnerability identity: (repo_url, vulnerability_id, package_name, source_path), because the same CVE can appear in multiple packages and lockfiles
- ID classification: primary ID + aliases sorted into cve_ids, ghsa_ids, other_ids arrays by prefix
- Severity: derived from CVSS numeric score using standard thresholds (CRITICAL/HIGH/MEDIUM/LOW)
- Status tracking: OPEN (no fix known), FIX_AVAILABLE (patch exists), RESOLVED (no longer detected)
- Database strategy: upsert + mark-resolved (not delete + insert), which preserves the full history of when vulnerabilities were first detected, last seen, and resolved
- Transitive scanning: resolves the full dependency graph by default; falls back to direct-only on timeout (3 min) for first scans; subsequent scans reuse the previous mode
- OOM handling: on any scanner crash, marks stale running scan records as failure; on OOM specifically (SIGKILL), retries with --no-transitive to skip the most memory-intensive part
- Scan tracking: every invocation creates a vulnerability_scans row (running → success/failure/no_packages_found) with duration, counts, and errors
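The ID classification and severity decisions above can be illustrated with a short sketch. The real logic lives in the Go scanner and may differ. The function names are hypothetical, the IDs are placeholders, and the cut-offs assume the CVSS v3 qualitative scale (9.0+ critical, 7.0+ high, 4.0+ medium). Mapping everything below 4.0 to LOW is an additional assumption, since the PR lists only four levels.

```python
def classify_ids(primary_id: str, aliases: list[str]) -> dict[str, list[str]]:
    """Sort the primary ID and its aliases into buckets by prefix."""
    buckets: dict[str, list[str]] = {"cve_ids": [], "ghsa_ids": [], "other_ids": []}
    # A set removes duplicates; sorting keeps the output deterministic.
    for vuln_id in sorted({primary_id, *aliases}):
        if vuln_id.startswith("CVE-"):
            buckets["cve_ids"].append(vuln_id)
        elif vuln_id.startswith("GHSA-"):
            buckets["ghsa_ids"].append(vuln_id)
        else:
            buckets["other_ids"].append(vuln_id)
    return buckets


def severity_from_cvss(score: float) -> str:
    """Map a CVSS base score to a label using the CVSS v3 cut-offs."""
    if score >= 9.0:
        return "CRITICAL"
    if score >= 7.0:
        return "HIGH"
    if score >= 4.0:
        return "MEDIUM"
    # Assumption: scores below 4.0 (including 0.0) are reported as LOW.
    return "LOW"


# Placeholder identifiers, not real advisories.
ids = classify_ids("GHSA-aaaa-bbbb-cccc", ["CVE-0000-0001", "EXAMPLE-1"])
label = severity_from_cvss(7.5)
```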
Note
Medium Risk
Adds a new repo-processing stage that shells out to a new Go scanner binary and writes/upserts vulnerability data into the insights DB, plus new analytics datasources/pipes; failures/timeouts/OOM handling could impact worker throughput and DB load.
Overview
Adds automated vulnerability scanning to git integration. The repository worker now runs a new `VulnerabilityScannerService` on the first clone batch, tracking execution via `OperationType.VULNERABILITY_SCAN`.

A new Go binary (`vulnerability-scanner`) is built into the git-integration Docker image and invoked via `run_shell_command` (now supports real-time stderr streaming and propagates subprocess `returncode`). The scanner creates/finalizes `vulnerability_scans`, upserts `vulnerabilities` with a resolve+upsert strategy, supports transitive dependency scanning with a `--no-transitive` fallback on timeout/OOM, and reads new `INSIGHTS_DB_*` env vars.

Adds Tinybird `vulnerabilities`/`vulnerability_scans` datasources and summary/list/breakdown pipes to query vulnerability counts by severity/ecosystem and last scan status.

Written by Cursor Bugbot for commit 0ea8a21. This will update automatically on new commits.
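As a rough illustration of the `--no-transitive` fallback on OOM, the sketch below retries once when the scanner's return code indicates SIGKILL. It is an assumption about how such a check could look, not the PR's code: `is_oom_kill`, `scan_with_fallback`, and `fake_scan` are hypothetical, POSIX signal semantics are assumed, and SIGKILL is only treated as a proxy for an OOM kill.

```python
import signal

# A child killed by SIGKILL shows up as -9 from Python's subprocess module,
# or as 128 + 9 = 137 when a shell or container runtime reports it.
SIGKILL_CODES = (-int(signal.SIGKILL), 128 + int(signal.SIGKILL))


def is_oom_kill(returncode: int) -> bool:
    """Treat SIGKILL as a likely OOM kill (an assumption, not a certainty)."""
    return returncode in SIGKILL_CODES


def scan_with_fallback(run_scan, repo_path: str) -> int:
    """Run a full scan; retry once without transitive resolution on OOM."""
    returncode = run_scan(repo_path, [])
    if is_oom_kill(returncode):
        returncode = run_scan(repo_path, ["--no-transitive"])
    return returncode


calls: list[list[str]] = []


def fake_scan(repo_path: str, extra_args: list[str]) -> int:
    # Hypothetical runner: the full scan is "killed", the reduced scan passes.
    calls.append(list(extra_args))
    return 0 if "--no-transitive" in extra_args else -9


final_code = scan_with_fallback(fake_scan, "/tmp/example-repo")
```

Passing the runner in as a parameter keeps the retry decision testable without launching the real binary.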